Statistical Physics of Self-Replication 



Jeremy L. England 
Department of Physics, Massachusetts Institute of Technology, 
Building 6C, 77 Massachusetts Avenue, Cambridge, MA 02139 
(Dated: September 7, 2012) 

Self-replication is a capacity common to every species of living thing, and simple physical intuition 
dictates that such a process must invariably be fueled by the production of entropy. Here, we 
undertake to make this intuition rigorous and quantitative by deriving a lower bound for the amount 
of heat that is produced during a process of self-replication in a system coupled to a thermal bath. 
We find that the minimum value for the physically allowed rate of heat production is determined by 
the growth rate, internal entropy, and durability of the replicator, and we discuss the implications 
of this finding for bacterial cell division, as well as for the pre-biotic emergence of self-replicating 
nucleic acids. 



Every living thing bears some resemblance to its ances- 
tors; this is a basic premise of biology. From the stand- 
point of physics, however, self-replication presents a chal- 
lenge. Reduced to its microscopic details, an organism 
cannot be distinguished from its environment: a priori, 
a fluctuating cluster of atoms does not "know" it is doing 
anything to affect the assembly of a similar-looking clus- 
ter. As biologists, we watch a bacterial cell divide and 
say that it facilitates its own duplication, but as physi- 
cists, plotting the course from one microstate to the next, 
we cannot attribute any more agency to the atoms of the 
bacterium than we can to the atoms in the sugar it eats; 
all we see is interacting particles, exploring a series of 
arrangements permitted by conservation of momentum 
and energy. 

To resolve this difficulty, we must recognize that the
"self" in self-replication is not anywhere implicit in the
atomistic physical description of the system. Rather, it 
arises only once an observer carries out a further classi- 
fication of microstates by providing a definition for some 
pattern of interest. Such a coarse-graining of phase space 
should be familiar to any student of statistical mechan- 
ics, except that here, there need not be any way of phys- 
ically summarizing (with, for example, an order parame- 
ter) the function of microscopic variables used to define 
the coarse-graining. 

To make things explicit, we might imagine a thought- 
experiment in which we showed every possible microstate 
for some system to a microbiologist who was asked to designate, in each case, how many live, healthy, wild-type E. coli bacteria were present. Though the microbiologist's
assessment would be based partly or wholly on qualita- 
tive criteria, we would nevertheless come away from the 
procedure with a well-defined value for our cell count 
at each point in phase space. In this way, holistic, bio- 
logical judgments could be rendered into numbers that 
might then be incorporated into a quantitative model of 
the system's dynamics. 

Here, we will use a coarse-graining like the one 
sketched above to investigate the statistical physics of 
self-replication. Based on a general argument from microscopic reversibility, we will derive a lower bound on
the heat output of a self-replicator in terms of its size, 
growth rate, entropy, and durability. We will further- 
more show, through analysis of empirical data, that this 
bound operates on a scale relevant to the functioning of 
real microorganisms and other self-replicators. 

We begin by considering the preparation of a large, finite system initially containing a single E. coli bacterium, immersed in a sample of rich nutrient media in contact with a heat bath held at the bacterial cell's optimal growth temperature ($1/\beta = T \approx 4.3 \times 10^{-21}$ joules, with temperature measured in energy units) [1, 2]. We can assume furthermore that the cell is in exponential growth phase at the beginning of its division cycle, and that, while the volume and mass of the entire system are held fixed, the composition and pressure of the nutrient media mimic those of a well-oxygenated sample open to the earth's atmosphere. If we summarize the experimental conditions described above with the label I, we can immediately say that there is some probability $p(i|\mathrm{I})$ that the system is found in some particular microstate $i$ given that it was prepared in the macroscopic condition I by some standard procedure. Although this probability might well be impossible to derive ab initio, in principle it could be measured through repeated consultations of a microbiologist as described above.

Now suppose we consider what would happen in our system if we started off in some microstate $i$ and observed it again after a time interval of $\tau_{\mathrm{div}}$, the typical duration of a single round of growth and cell division. From the biological standpoint, the expected final state for the system is clear: two bacteria floating in the media instead of one, and various surrounding atoms rearranged into new molecular combinations (e.g., some oxygen converted into carbon dioxide). While very likely, however, such an outcome is not certain, and in general we have to consider that each microstate $j$ will have some finite likelihood $p(j|\mathrm{II})$. Since our system is coupled to a heat bath, it obeys stochastic dynamics described by the transition matrix $\pi(i \to j)$, which is the conditional probability of ending up in microstate $j$ (with energy $E_j$) at time $t = \tau_{\mathrm{div}}$ given that one started out in microstate $i$ (with energy $E_i$) at time $t = 0$. In these terms, our ensemble II of possible arrangements after time $\tau_{\mathrm{div}}$ can be defined via

$$ p(j|\mathrm{II}) = \int_{\mathrm{I}} di\; p(i|\mathrm{I})\, \pi(i \to j), \tag{1} $$

which we take to have the normalization $\int dj\, p(j|\mathrm{II}) = 1$.
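As a concrete and purely illustrative check of eq. (1), the sketch below replaces the continuous phase space with a handful of hypothetical discrete microstates, so that the integral becomes a vector-matrix product. The random transition matrix and the choice of which states make up ensemble I are assumptions made for illustration. They are not a model of a bacterium.

```python
import numpy as np

def propagate(p_initial, transition):
    """Discrete analogue of eq. (1): p(j|II) = sum_i p(i|I) pi(i -> j).

    transition[i, j] is the (hypothetical) probability of going from
    microstate i to microstate j in one division time; rows must sum to 1.
    """
    p_initial = np.asarray(p_initial, dtype=float)
    transition = np.asarray(transition, dtype=float)
    if not np.allclose(transition.sum(axis=1), 1.0):
        raise ValueError("each row of the transition matrix must sum to 1")
    return p_initial @ transition

# Toy example: 4 hypothetical microstates, ensemble I supported on states 0 and 1.
rng = np.random.default_rng(0)
T = rng.random((4, 4))
T /= T.sum(axis=1, keepdims=True)   # turn each row into a probability distribution
p_I = np.array([0.7, 0.3, 0.0, 0.0])
p_II = propagate(p_I, T)
print(p_II, p_II.sum())             # the final ensemble remains normalized
```

Because every row of the transition matrix sums to one, the normalization of $p(j|\mathrm{II})$ is inherited automatically from that of $p(i|\mathrm{I})$.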

By stipulation, there are no external driving forces acting on the system, and it can furthermore be assumed that the changes in our bacterial incubator over the course of time $\tau_{\mathrm{div}}$ are dominated by diffusive motions that lack any sense of momentum [3]. Thus, the transition matrix $\pi(i \to j)$ must obey a detailed balance relation

$$ \frac{\pi(j \to i)}{\pi(i \to j)} = \exp[-\beta \Delta Q_{i \to j}], \tag{2} $$

where $\Delta Q_{i \to j} = E_i - E_j$ is the heat released into the surrounding bath in the course of going from $i$ to $j$. This follows from the underlying microscopic reversibility of particle dynamics in our system, along with the fact that the thermal equilibrium distribution $p(i) \propto \exp[-\beta E_i]$ must be stationary under $\pi(i \to j)$ [4].
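The following is a minimal numerical sketch of eq. (2). It assumes a toy system with a few discrete energy levels and Metropolis dynamics. That choice is ours and is made for illustration only; the text does not specify any particular dynamics. The sketch checks two things: that the ratio of reverse to forward transition probabilities equals $\exp[-\beta \Delta Q_{i \to j}]$, and that the Boltzmann distribution is stationary.

```python
import numpy as np

def metropolis_matrix(energies, beta):
    """One-step transition matrix pi(i -> j) for a toy system coupled to a bath.

    Assumption: Metropolis dynamics with uniform proposals (probability 1/n
    per state); this is only one of many choices that satisfy eq. (2).
    """
    E = np.asarray(energies, dtype=float)
    n = len(E)
    # Accept a proposed move i -> j with probability min(1, exp[-beta (E_j - E_i)]).
    P = np.minimum(1.0, np.exp(-beta * (E[None, :] - E[:, None]))) / n
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))  # rejected proposals leave the state unchanged
    return P

beta = 1.0
E = np.array([0.0, 0.5, 1.3, 2.0])            # hypothetical microstate energies
P = metropolis_matrix(E, beta)

# Eq. (2): pi(j -> i) / pi(i -> j) = exp[-beta dQ_{i->j}], with dQ_{i->j} = E_i - E_j.
dQ = E[:, None] - E[None, :]
assert np.allclose(P.T / P, np.exp(-beta * dQ))

# The Boltzmann distribution p(i) ∝ exp[-beta E_i] is stationary under P.
p_eq = np.exp(-beta * E)
p_eq /= p_eq.sum()
assert np.allclose(p_eq @ P, p_eq)
```

With uniform proposals, every diagonal entry is at least $1/n$, so no element of the matrix vanishes and the ratio check never divides by zero.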

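To appreciate the thermodynamic consequences of the assumptions we have already made, it is necessary to consider the reverse probability

$$ \pi(\mathrm{II} \to \mathrm{I}) = \int_{\mathrm{I}} di \int_{\mathrm{II}} dj\; p(j|\mathrm{II})\, \pi(j \to i), \tag{3} $$

that is, the minuscule probability that the system returns to one of the microstates satisfying the macroscopic condition I in time $\tau_{\mathrm{div}}$, given that we start out with an initial distribution over microstates of $p(j|\mathrm{II})$. Substituting from eq. (2), we can rearrange this quantity to obtain

$$
\begin{aligned}
\pi(\mathrm{II} \to \mathrm{I}) &= \int_{\mathrm{I}} di \int_{\mathrm{II}} dj\; \frac{p(j|\mathrm{II})}{p(i|\mathrm{I})}\, p(i|\mathrm{I})\, \pi(j \to i) \\
&= \int_{\mathrm{I}} di \int_{\mathrm{II}} dj\; p(i|\mathrm{I})\, \pi(i \to j) \left[ \frac{p(j|\mathrm{II})}{p(i|\mathrm{I})}\, e^{-\beta \Delta Q_{i \to j}} \right] \\
&= \left\langle \frac{p(j|\mathrm{II})}{p(i|\mathrm{I})}\, e^{-\beta \Delta Q_{i \to j}} \right\rangle_{\mathrm{I} \to \mathrm{II}},
\end{aligned}
\tag{4}
$$

where $\langle \ldots \rangle_{\mathrm{I} \to \mathrm{II}}$ denotes an average over all paths from some $i$ in the initial ensemble I to some $j$ in the final ensemble II, with each path weighted by its likelihood.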
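The identity in eq. (4) can be checked on a small discrete model. The sketch below makes two assumptions. First, it uses hypothetical Metropolis dynamics among five states. Second, it iterates that dynamics for ten steps to stand in for one division time; a power of a detailed-balance matrix still satisfies eq. (2). It then compares the reverse probability computed directly from eq. (3) with the forward-path average in eq. (4).

```python
import numpy as np

beta = 1.0
E = np.array([0.0, 0.4, 1.1, 1.7, 2.5])    # hypothetical microstate energies
n = len(E)

# One Metropolis step: propose any state with probability 1/n, accept with min(1, e^{-beta dE}).
step = np.minimum(1.0, np.exp(-beta * (E[None, :] - E[:, None]))) / n
np.fill_diagonal(step, 0.0)
np.fill_diagonal(step, 1.0 - step.sum(axis=1))
pi = np.linalg.matrix_power(step, 10)      # pi[i, j] plays the role of pi(i -> j) over tau_div

I = np.array([0, 1])                        # microstates belonging to ensemble I
p_I = np.array([0.6, 0.4])                  # p(i|I) on those states
p_II = p_I @ pi[I, :]                       # eq. (1)

# Eq. (3): probability of returning to I, starting from ensemble II
reverse_direct = np.sum(p_II[:, None] * pi[:, I])

# Eq. (4): the same quantity as a path average over forward transitions I -> II
dQ = E[I][:, None] - E[None, :]             # dQ[a, j] = E_i - E_j for i = I[a]
path_weight = p_I[:, None] * pi[I, :]       # p(i|I) pi(i -> j)
observable = (p_II[None, :] / p_I[:, None]) * np.exp(-beta * dQ)
reverse_path_avg = np.sum(path_weight * observable)

print(reverse_direct, reverse_path_avg)
```

The two printed numbers should agree to floating-point precision. The agreement relies only on detailed balance and on $p(i|\mathrm{I}) > 0$ throughout I.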

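Defining the Shannon entropy $S$ for each ensemble in the usual manner ($S = -\sum_i p_i \ln p_i$), we can construct $\Delta S_{\mathrm{int}} = S_{\mathrm{II}} - S_{\mathrm{I}}$, which measures the internal entropy change for the replication reaction. Dividing eq. (4) by $\pi(\mathrm{II} \to \mathrm{I})$, we may write

$$ \left\langle \exp\left[ -\beta \Delta Q_{i \to j} - \ln \pi(\mathrm{II} \to \mathrm{I}) - \ln p(i|\mathrm{I}) + \ln p(j|\mathrm{II}) \right] \right\rangle_{\mathrm{I} \to \mathrm{II}} = 1. \tag{5} $$

Since $e^x \ge 1 + x$ for all $x$, the path average of the exponent in eq. (5) cannot be positive. Using $\langle \ln p(i|\mathrm{I}) \rangle_{\mathrm{I} \to \mathrm{II}} = -S_{\mathrm{I}}$ and $\langle \ln p(j|\mathrm{II}) \rangle_{\mathrm{I} \to \mathrm{II}} = -S_{\mathrm{II}}$, we immediately arrive at

$$ \beta \langle \Delta Q \rangle_{\mathrm{I} \to \mathrm{II}} + \ln[\pi(\mathrm{II} \to \mathrm{I})] + \Delta S_{\mathrm{int}} \ge 0. \tag{6} $$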

In one sense, this result simply says what we might have guessed: the average total entropy production for the forward process, $\langle \Delta S_{\mathrm{tot}} \rangle = \Delta S_{\mathrm{int}} + \beta \langle \Delta Q \rangle$, sets a bound on how likely things would be to run in reverse. Since the probability $\pi(\mathrm{II} \to \mathrm{I}) \le 1$, it follows that $\langle \Delta S_{\mathrm{tot}} \rangle \ge 0$. Put another way, what we have here is simply a precise statement of the Second Law of Thermodynamics. It should therefore perhaps not be surprising to learn that the formula applies under conditions more general than those for which it was derived. In any diffusive, microscopically reversible system driven from equilibrium by a time-symmetric drive, the irreversibility formula derived by Crooks [3] can be used to show that $\pi(j \to i)/\pi(i \to j) = \langle \exp[-\beta \Delta Q_{i \to j}] \rangle_{i \to j}$. Thus, although we have prefaced this investigation with a discussion of self-replication, our bound holds for a range of driven, nonequilibrium transitions between ensembles. In this light, we can see that the result is closely related to the well-known Landauer bound for the heat generated by the erasure of a bit of information [5].
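To see inequality (6) at work numerically, the following sketch evaluates each of its terms on a discrete state space. It again assumes toy Metropolis dynamics, this time with randomly drawn, hypothetical energies. The left-hand side of eq. (6) should never be negative, and neither should the total entropy production $\langle \Delta S_{\mathrm{tot}} \rangle$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy S = -sum_i p_i ln p_i, skipping zero-probability states."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

rng = np.random.default_rng(1)
beta = 1.0
E = rng.uniform(0.0, 3.0, size=6)                 # hypothetical microstate energies
n = len(E)

# Toy Metropolis dynamics (an assumption), iterated to represent one division time.
step = np.minimum(1.0, np.exp(-beta * (E[None, :] - E[:, None]))) / n
np.fill_diagonal(step, 0.0)
np.fill_diagonal(step, 1.0 - step.sum(axis=1))
pi = np.linalg.matrix_power(step, 5)

I = np.array([0, 1, 2])                           # microstates making up ensemble I
p_I = np.array([0.5, 0.3, 0.2])                   # p(i|I)
p_II = p_I @ pi[I, :]                             # eq. (1)

# Terms of eq. (6)
mean_heat = np.sum(p_I[:, None] * pi[I, :] * (E[I][:, None] - E[None, :]))  # <dQ>_{I->II}
reverse = np.sum(p_II[:, None] * pi[:, I])        # pi(II -> I), eq. (3)
dS_int = entropy(p_II) - entropy(p_I)

lhs = beta * mean_heat + np.log(reverse) + dS_int
dS_tot = dS_int + beta * mean_heat
print(f"eq. (6) left-hand side: {lhs:.4f}, <dS_tot>: {dS_tot:.4f}")
```

Because $\ln \pi(\mathrm{II} \to \mathrm{I}) \le 0$, a non-negative left-hand side automatically implies $\langle \Delta S_{\mathrm{tot}} \rangle \ge 0$, which is the Second Law statement made in the text.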

Having established a general thermodynamic constraint on how self-replication can proceed, we must now consider whether our finding is relevant in particular cases of interest. For the process of bacterial cell division introduced above, our ensemble II is a bath of nutrient-rich media containing two bacterial cells in exponential growth phase at the start of their division cycles. In order to make use of the relation in (6), we need to estimate the likelihood that after time $\tau_{\mathrm{div}}$, we will have ended up in an arrangement in ensemble I. In such an arrangement, only one, newly formed bacterium is present in the system, and the other cell has somehow been converted back into the food from which it was built. Our first hint that this likelihood must be quite small comes from our knowledge of the system's biology: if we start with two cells and wait a full division time, we will almost certainly end up with four! The challenge before us is therefore to quantify the extreme unlikelihood of the system doing something else.

The first piece is relatively easy to imagine. We may not be able to compute the exact probability of a bacterium fluctuating to peptide-sized pieces and de-respirating a certain amount of carbon dioxide and water. We can, however, be confident that this outcome is less likely than all the peptide bonds in the bacterium spontaneously hydrolyzing. Happily, this latter probability $p_{\mathrm{hyd}}$ may be estimated in terms of the number of such bonds $n_{\mathrm{pep}}$, the division time $\tau_{\mathrm{div}}$, and the peptide bond half-life $\tau_{\mathrm{hyd}}$. Assuming Poissonian statistics and a large value for $n_{\mathrm{pep}}$, we have

$$ \ln p_{\mathrm{hyd}} \simeq n_{\mathrm{pep}} \ln[\tau_{\mathrm{div}}/(n_{\mathrm{pep}} \tau_{\mathrm{hyd}})]. $$

FIG. 1. A single bacterium in ensemble I is most likely to grow and divide (green arrows) as it progresses to ensemble II (panel A). As panel B depicts, it is possible that a transition from II to I could be achieved by the spontaneous disintegration of one bacterial cell (orange arrows) accompanied by the spontaneous pausing of the growth of another one (red arrow). However, it is arguable that the scenario displayed in panel C is more likely. Here, one cell disintegrates, while another one divides into two cells, one of which then subsequently disintegrates.

Handling the cell that stays alive is more challenging. We have assumed that this cell is growing processively, and we ought not make the mistake of thinking that such a reaction can be halted or paused (Fig. 1B) by a small perturbation. The onset of exponential growth phase is preceded in E. coli by a lag phase that can last several hours [1], during which gene expression is substantially altered so as to retool the cell for rapid division fueled by the available metabolic substrates [6]. It is therefore appropriate to think of the cell in question as an optimized mixture of components primed to participate in irreversible, forward reactions like nutrient metabolism and protein synthesis.



We can therefore argue that the likelihood $p_{\mathrm{pause}}$ of a spontaneous, sustained pause (of duration $\tau_{\mathrm{div}}$) in the progression of these reactions is very small indeed. Suppose each enzymatic protein component of the cell were to reject each attempt of a substrate to diffuse to its active site, assuming a diffusion time of small molecules between proteins of $\tau_{\mathrm{diff}} \sim 10^{-8}$ s [7, 8]. We would then expect $|\ln p_{\mathrm{pause}}| \propto n_{\mathrm{pep}} (\tau_{\mathrm{div}}/\tau_{\mathrm{diff}})$ to exceed $|\ln p_{\mathrm{hyd}}|$ by orders of magnitude. We must, however, consider an alternative mechanism for the most likely $\mathrm{II} \to \mathrm{I}$ transition (Fig. 1C): it is possible that a cell could grow and divide in an amount of time slightly less than $\tau_{\mathrm{div}}$ [2]. Suppose that, subsequent to such an event, the daughter cell of the recent division were to spontaneously disintegrate back into its constituent nutrients, with log-probability at most on the order of $n_{\mathrm{pep}} \ln[\tau_{\mathrm{div}}/(n_{\mathrm{pep}} \tau_{\mathrm{hyd}})]$. We would then complete the interval of $\tau_{\mathrm{div}}$ with one, recently divided, processively growing bacterium in our system; that is, we would have returned to the I ensemble. Thus, via a back door into I provided to us by bacterial biology, we can claim that

$$ \ln \pi(\mathrm{II} \to \mathrm{I}) < 2 \ln p_{\mathrm{hyd}} \simeq 2 n_{\mathrm{pep}} \ln[\tau_{\mathrm{div}}/(n_{\mathrm{pep}} \tau_{\mathrm{hyd}})]. \tag{7} $$

Having obtained the above result, we can now refer back to the bound we set for the heat produced by this self-replication process and write

$$ \beta \langle \Delta Q \rangle > 2 n_{\mathrm{pep}} \ln[(n_{\mathrm{pep}} \tau_{\mathrm{hyd}})/\tau_{\mathrm{div}}] - \Delta S_{\mathrm{int}}. \tag{8} $$

This relation demonstrates that the heat evolved in the course of the cell making a copy of itself is set by three factors. The first is the decrease in entropy required to arrange molecular components of the surrounding medium into a new organism. The second is how rapidly this takes place (through the division time $\tau_{\mathrm{div}}$). The third is how long we have to wait for the newly assembled structure to start falling apart (through $\tau_{\mathrm{hyd}}$). Moreover, we can now quantify the extent of each factor's contribution to the final outcome in terms of $n_{\mathrm{pep}}$, which we estimate to be $1.6 \times 10^9$, assuming the dry mass of the bacterium is 0.3 picograms.
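The text does not spell out how $n_{\mathrm{pep}} \approx 1.6 \times 10^9$ follows from the dry mass. One way to reproduce this order of magnitude rests on two assumptions of ours: that essentially all of the 0.3 pg dry mass is counted as peptide, and that the average amino acid residue mass is about 110 Da.

```python
DALTON_G = 1.66054e-24      # grams per dalton
dry_mass_g = 0.3e-12        # 0.3 picograms of dry mass (from the text)
residue_mass_da = 110.0     # assumed average residue mass, not stated in the text

# Number of residues, and hence roughly the number of peptide bonds, in the cell.
n_pep = dry_mass_g / (residue_mass_da * DALTON_G)
print(f"{n_pep:.2e}")
```

A smaller protein fraction of the dry mass would lower this count proportionally. The figure should therefore be read as a generous upper estimate.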

The total amount of heat produced in a single division cycle for an E. coli bacterium growing at its maximum rate on lysogeny broth (a mixture of peptides and glucose) is $\beta \langle \Delta Q \rangle \approx 220\, n_{\mathrm{pep}}$ [9]. We expect the largest contributions to the internal entropy change for cell division to come from two sources. The first is the equimolar conversion of oxygen to carbon dioxide, since carbon dioxide has a significantly lower partial pressure in the atmosphere. The second is the confinement of amino acids floating freely in the broth to specific locations inside bacterial proteins. We can estimate the contribution of the first factor (which increases entropy) by noting that $\ln(v_{\mathrm{CO_2}}/v_{\mathrm{O_2}}) \approx 6$, where $v$ denotes the volume available per molecule. The liberation of carbon from various metabolites also increases entropy by shuffling around vibrational and rotational degrees of freedom, but we only expect this to make some order-unity modification to the entropy per carbon atom metabolized. At the same time, peptide anabolism reduces entropy. Suppose that in 1% tryptone broth, an amino acid starts with a volume to explore of $v_i = 100\ \mathrm{nm}^3$ and ends up tightly folded up in some $v_f = 0.001\ \mathrm{nm}^3$ sub-volume of a protein; we then obtain $\ln(v_f/v_i) \approx -12$. In light of the fact that the bacterium consumes during division a number of oxygen molecules roughly equal to the number of amino acids in the new cell it creates [1, 10], we can arbitrarily set a generous upper bound of $-\Delta S_{\mathrm{int}} < 10\, n_{\mathrm{pep}}$.
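The two entropy estimates above can be reproduced with a few lines of arithmetic. The atmospheric mole fractions used here (about 21% O2 and 0.04% CO2) are our assumptions, since the text quotes only the resulting logarithm. The volumes $v_i$ and $v_f$ are taken from the text.

```python
import math

# Assumed atmospheric mole fractions (not given in the text).
x_O2, x_CO2 = 0.21, 0.0004
# Volume per molecule scales inversely with partial pressure: v_CO2 / v_O2 = x_O2 / x_CO2.
gain_per_O2 = math.log(x_O2 / x_CO2)           # entropy gained per O2 -> CO2 conversion

# Amino acid confinement, using the volumes quoted in the text (nm^3).
v_i, v_f = 100.0, 0.001
loss_per_residue = math.log(v_f / v_i)         # entropy change per amino acid incorporated

# As in the text, assume roughly one O2 consumed per amino acid built into the new cell.
net_per_residue = gain_per_O2 + loss_per_residue
print(round(gain_per_O2, 1), round(loss_per_residue, 1), round(net_per_residue, 1))
```

The net change per residue comes out at roughly $-5$, so $-\Delta S_{\mathrm{int}}$ sits comfortably inside the generous bound of $10\, n_{\mathrm{pep}}$.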

In order to compare this contribution to that of the irreversibility term in (6), we assume a cell division time of 20 minutes [1, 2] and a spontaneous hydrolysis lifetime for peptide bonds of 600 years at physiological pH [11]. These values yield $2 n_{\mathrm{pep}} \ln[(n_{\mathrm{pep}} \tau_{\mathrm{hyd}})/\tau_{\mathrm{div}}] \approx 1.2 \times 10^{11} \approx 75\, n_{\mathrm{pep}}$, a quantity at least several times larger than $|\Delta S_{\mathrm{int}}|$. Thus, as we may have expected based on previous evidence [12], the entropic cost for aerobic bacterial respiration is relatively small. It is substantially outstripped by the sheer irreversibility of the self-replication reaction as it churns out copies that do not easily disintegrate into their constituent parts.
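The irreversibility term can be evaluated directly from the numbers quoted above. The only added assumption is the conversion of 600 years to seconds using a 365.25-day year.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
n_pep = 1.6e9                         # peptide bonds per cell (from the text)
tau_div = 20 * 60.0                   # division time: 20 minutes, in seconds
tau_hyd = 600 * SECONDS_PER_YEAR      # peptide bond hydrolysis lifetime: 600 years

# Irreversibility contribution to the bound (8): 2 n_pep ln[(n_pep tau_hyd) / tau_div]
irreversibility = 2 * n_pep * math.log(n_pep * tau_hyd / tau_div)
print(f"{irreversibility:.2e} ({irreversibility / n_pep:.1f} per peptide bond)")
```

With these inputs the term comes out near $1.2 \times 10^{11}$, or about $75\, n_{\mathrm{pep}}$. That is several times the generous entropy bound of $10\, n_{\mathrm{pep}}$.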

More significantly, these calculations also establish 
that the E. coli bacterium produces an amount of heat 
less than three times as large as the absolute physical 
lower bound dictated by its growth rate, internal entropy 
production, and durability. In light of the fact that the 
bacterium is a complex sensor of its environment that can 
very effectively adapt itself to growth in a broad range of 
different environments, we should not be surprised that 
it is not perfectly optimized for any given one of them. 
Rather, it is remarkable that in a single environment, the 
organism can convert chemical energy into a new copy of 
itself so efficiently that if it were to produce even half 
as much heat it would be pushing the limits of what is 
thermodynamically possible! This is especially the case 
since our estimate of the reverse reaction rate based on $p_{\mathrm{hyd}}$ is deliberately generous: it does not account for the unlikelihood of spontaneously converting carbon dioxide back into oxygen. Thus, a more accurate estimate of the lower bound on $\beta \langle \Delta Q \rangle$ in the future may reveal
E. coli to be an even more exceptionally well-adapted 
self-replicator than it currently seems. 
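The "less than three times" comparison can be made explicit in units of $n_{\mathrm{pep}}$. This sketch takes two values from the text: the measured heat ($220\, n_{\mathrm{pep}}$) and the irreversibility term ($75\, n_{\mathrm{pep}}$). It then lets $-\Delta S_{\mathrm{int}}/n_{\mathrm{pep}}$ range between 0 and the generous bound of 10. Taking that range, and in particular assuming $\Delta S_{\mathrm{int}} \le 0$, is our assumption, though it is consistent with the entropy estimates above.

```python
# All quantities in units of n_pep (dimensionless, since beta*Q is measured in k_B T).
measured_heat = 220.0        # beta<Q>/n_pep for E. coli on lysogeny broth (from the text)
irreversibility = 75.0       # 2 ln[(n_pep tau_hyd)/tau_div] (from the text)

ratios = []
for minus_dS_int in (0.0, 5.0, 10.0):    # assumed range for -Delta S_int / n_pep
    bound = irreversibility + minus_dS_int   # right-hand side of eq. (8)
    ratios.append(measured_heat / bound)
    print(f"-dS_int = {minus_dS_int:4.1f}: bound = {bound:.0f}, heat/bound = {ratios[-1]:.2f}")
```

Across this range the measured heat is between about 2.6 and 2.9 times the bound.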

We have seen how this argument plays out for a bacterium, but the approach applies equally in a broad range of cases where there is some reliable stochastic model of a replicator's population dynamics [13]. For example, a recent study has used in vitro evolution to optimize the growth rate of a self-replicating RNA molecule whose formation is accompanied by a single backbone ligation reaction and the leaving of a single pyrophosphate group [14]. We take a doubling time of 1 hour and a half-life for RNA of 4 years [15]. We also make the reasonable assumption, in this case, that the change in translational entropy is negligible. With these inputs, we can estimate the heat bound as $\langle \Delta Q \rangle > RT \ln[(4\ \text{years})/(1\ \text{hour})] \approx 7\ \mathrm{kcal\,mol^{-1}}$. Since experimental data indicate an enthalpy for the reaction in the vicinity of $10\ \mathrm{kcal\,mol^{-1}}$ [16, 17], it would seem that this molecule operates quite near the limit of thermodynamic efficiency set by the way it is assembled.
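The RNA estimate is a one-line calculation. This sketch assumes a temperature of 310 K, since the text does not state the temperature used, and a 365.25-day year. With these choices it gives roughly 6.4 kcal mol^-1, close to the rounded value of about 7 kcal mol^-1 quoted above. The small difference presumably reflects the temperature and rounding used.

```python
import math

R_KCAL = 1.987e-3           # gas constant, kcal mol^-1 K^-1
HOURS_PER_YEAR = 8766.0     # 365.25 days

def heat_bound_kcal(tau_hyd_hours, tau_div_hours, temperature_k):
    """Per-ligation heat bound RT ln(tau_hyd / tau_div), neglecting Delta S_int."""
    return R_KCAL * temperature_k * math.log(tau_hyd_hours / tau_div_hours)

# RNA: half-life 4 years, doubling time 1 hour; 310 K is an assumed temperature.
rna_bound = heat_bound_kcal(4 * HOURS_PER_YEAR, 1.0, 310.0)
print(f"{rna_bound:.1f} kcal/mol")
```

The result stays below the measured enthalpy of about 10 kcal mol^-1 for any temperature near physiological values.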

To underline this point, we may consider what the bound might be if this same reaction were somehow achieved using DNA, which is much more kinetically stable against hydrolysis than RNA [18]. In this case, we would have $\langle \Delta Q \rangle > RT \ln[(3 \times 10^7\ \text{years})/(1\ \text{hour})] \approx 16\ \mathrm{kcal\,mol^{-1}}$. This value exceeds the estimated enthalpy for the ligation reaction, so such a replicator is prohibited thermodynamically. The calculation illustrates a significant difference between DNA and RNA in their ability to participate in self-catalyzed replication reactions fueled by simple triphosphate building blocks. The far greater durability of DNA demands that a much higher per-base thermodynamic cost be paid in entropy production [19] in order for its growth rate to match that of RNA in an all-things-equal comparison. Moreover, the heat bound difference between DNA and RNA should increase roughly linearly in $\ell$, the number of bases ligated during the reaction. This forces the maximum possible growth rate for a DNA replicator to shrink exponentially with $\ell$ in comparison to that of its RNA equivalent. This observation is certainly intriguing in light of past arguments, made on other grounds, that RNA, and not DNA, must have acted as the material for the pre-biotic emergence of self-replicating nucleic acids.
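The same arithmetic applies to the hypothetical DNA replicator. It also shows why the DNA-RNA gap grows with the number of ligated bases $\ell$. Each base contributes its own term $RT \ln(\tau_{\mathrm{hyd}}/\tau_{\mathrm{div}})$, so the per-base gap $RT \ln(\tau_{\mathrm{hyd}}^{\mathrm{DNA}}/\tau_{\mathrm{hyd}}^{\mathrm{RNA}})$ is multiplied by $\ell$. As before, the 310 K temperature is our assumption. The additive per-base model is likewise a simplification made for illustration.

```python
import math

R_KCAL = 1.987e-3           # gas constant, kcal mol^-1 K^-1
HOURS_PER_YEAR = 8766.0     # 365.25 days
T = 310.0                   # assumed temperature, K

def heat_bound_kcal(tau_hyd_hours, tau_div_hours=1.0):
    """Per-base heat bound RT ln(tau_hyd / tau_div), neglecting Delta S_int."""
    return R_KCAL * T * math.log(tau_hyd_hours / tau_div_hours)

rna = heat_bound_kcal(4 * HOURS_PER_YEAR)       # RNA half-life: 4 years
dna = heat_bound_kcal(3e7 * HOURS_PER_YEAR)     # DNA hydrolysis lifetime: 3e7 years
enthalpy = 10.0                                  # approximate ligation enthalpy, kcal/mol
print(f"RNA {rna:.1f}, DNA {dna:.1f} kcal/mol; DNA bound exceeds enthalpy: {dna > enthalpy}")

# Hypothetical l-base ligation: the DNA-RNA gap in the heat bound scales linearly with l.
gaps = {l: l * (dna - rna) for l in (1, 2, 4, 8)}
```

The per-base gap, about 9.8 kcal mol^-1 at 310 K, does not depend on the doubling time, because $\tau_{\mathrm{div}}$ cancels between the two bounds.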

The process of cellular division, even in a creature as 
ancient and streamlined as a bacterium, is so bewilder- 
ingly complex that it may come as some surprise that 
physics can make any binding pronouncements about 
how fast it all can happen. The reason this becomes possible is that time-symmetrically driven, nonequilibrium processes in constant-temperature baths obey general laws that relate forward and reverse transition probabilities to heat production [3]. Previously, such laws had been applied successfully in understanding the thermodynamics of copying "informational" molecules such as nucleic acids [13]. In those cases, however, the information content of the system's molecular structure could
more easily be taken for granted, in light of the clear role 
played by DNA in the production of RNA and protein. 
What we have glimpsed here is that the underlying con- 
nection between entropy production and transition prob- 
ability has a much more general applicability, so long 
as we recognize that "self-replication" is something that 
happens relative to an observer: only once a classification 
scheme determines how many copies of some object are 
in the system for each microstate can we talk in proba- 
bilistic terms about the general tendency for that object 
to affect its own reproduction, and the same system's 
microstates can be classified using any number of different schemes. We may hope that this insight spurs future work that will clarify the general physical constraints obeyed by natural selection in nonequilibrium systems.

The author thanks C. Cooney, J. Gore, A. Grosberg, 
D. Sivak, and A. Szabo for helpful comments. 